Label smoothing is a regularization technique widely used in supervised learning to improve the generalization of models on various tasks, such as image classification and machine translation. However, the effectiveness of label smoothing in multi-hop question answering (MHQA) has yet to be well studied. In this paper, we systematically analyze the role of label smoothing on various modules of MHQA and propose F1 smoothing, a novel label smoothing technique specifically designed for machine reading comprehension (MRC) tasks. We evaluate our method on the HotpotQA dataset and demonstrate its superiority over several strong baselines, including models that utilize complex attention mechanisms. Our results suggest that label smoothing can be effective in MHQA, but the choice of smoothing strategy can significantly affect performance.
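As a point of reference for the abstract above, the following is a minimal sketch of standard uniform label smoothing, not the F1 smoothing the paper proposes (whose details are not given here). The function names `smooth_labels` and `cross_entropy` and the default `epsilon=0.1` are illustrative assumptions.

```python
import math


def smooth_labels(target, num_classes, epsilon=0.1):
    """Return a smoothed target distribution for one class index.

    Standard uniform smoothing: the one-hot target is mixed with a
    uniform distribution, q = (1 - epsilon) * one_hot + epsilon / K.
    """
    if not 0 <= target < num_classes:
        raise ValueError("target must be a valid class index")
    uniform_mass = epsilon / num_classes
    dist = [uniform_mass] * num_classes
    dist[target] += 1.0 - epsilon
    return dist


def cross_entropy(probs, target_dist):
    """Cross-entropy between predicted probabilities and a soft target."""
    return -sum(t * math.log(p) for t, p in zip(target_dist, probs) if t > 0)


# Hypothetical 4-class example: the true class is index 1.
soft_target = smooth_labels(1, 4, epsilon=0.1)
loss = cross_entropy([0.1, 0.7, 0.1, 0.1], soft_target)
```

With `epsilon=0`, the soft target reduces to the usual one-hot vector, so the smoothing strength can be tuned per module.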
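Augmented reality (AR) see-through vision is an interesting research topic because it enables users to see through walls and view occluded objects. Most existing research focuses on the visual effects of see-through vision, while interaction methods are less studied. However, we argue that common interaction modalities, e.g., midair click and speech, may not be the best way to control see-through vision. This is because when we want to see through something, the act is related to our gaze depth/vergence and should therefore be controlled naturally by the eyes. Following this idea, this paper proposes a novel gaze-vergence-controlled (GVC) see-through vision technique in AR. Since gaze depth is required, we build a gaze tracking module from two infrared cameras and a corresponding algorithm, and assemble it into the Microsoft HoloLens 2 to achieve gaze depth estimation. We then propose two different GVC modes for see-through vision to fit different scenarios. Extensive experimental results show that our gaze depth estimation is efficient and accurate. Compared with conventional interaction modalities, our GVC techniques are also superior in efficiency and are preferred by users. Finally, we present four example applications of gaze-vergence-controlled see-through vision.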
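Medical dialogue generation is an important yet challenging task. Most previous works rely on attention mechanisms and large-scale pretrained language models. However, these methods often fail to capture pivotal information from a long dialogue history and so struggle to produce accurate and informative responses, because medical entities are usually scattered across multiple utterances and have complex relationships among them. To mitigate this problem, we propose a medical response generation model with Pivotal Information Recalling (MedPIR), which is built on two components: a knowledge-aware dialogue graph encoder and a recall-enhanced generator. The knowledge-aware dialogue graph encoder constructs a dialogue graph by exploiting the knowledge relationships between entities in the utterances, and encodes it with a graph attention network. The recall-enhanced generator then strengthens the use of this pivotal information by generating a summary of the dialogue before producing the actual response. Experimental results on two large-scale medical dialogue datasets show that MedPIR outperforms strong baselines in BLEU scores and medical entity F1.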
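To preserve user privacy while enabling mobile intelligence, techniques have been proposed to train deep neural networks on decentralized data. However, training on decentralized data makes the design of neural architectures very difficult. This difficulty is further amplified when designing and deploying different neural architectures for heterogeneous mobile platforms. In this work, we propose bringing automatic neural architecture search into decentralized training, as a new DNN training paradigm called Federated Neural Architecture Search, or federated NAS. To address the primary challenge of limited on-client computation and communication resources, we present FedNAS, a highly optimized framework for efficient federated NAS. FedNAS fully exploits the key opportunity that model candidates need not be fully re-trained during the architecture search process, and incorporates three key optimizations: training candidates in parallel on partial clients, dropping candidates with inferior performance early, and using dynamic round numbers. Tested on large-scale datasets and typical CNN architectures, FedNAS achieves model accuracy comparable to a state-of-the-art NAS algorithm that trains models with centralized data, and reduces client cost by up to two orders of magnitude compared with a straightforward design of federated NAS.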
Model bias triggered by long-tailed data has been widely studied. However, measures based on the number of samples cannot explain three phenomena simultaneously: (1) Given enough data, the classification performance gain from additional samples is marginal. (2) When data is insufficient, classification performance decays precipitously as the number of training samples decreases. (3) Models trained on sample-balanced datasets still have different biases for different classes. In this work, we define and quantify the semantic scale of classes, which is used to measure the feature diversity of classes. We find experimentally that semantic scale has a marginal effect, which describes the first two phenomena well. Further, we propose a quantitative measurement of semantic scale imbalance, which can accurately reflect model bias on multiple datasets, even on sample-balanced data, revealing a novel perspective for the study of class imbalance. Given the prevalence of semantic scale imbalance, we propose semantic-scale-balanced learning, including a general loss improvement scheme and a dynamic re-weighting training framework that overcomes the challenge of calculating semantic scales in real time during iterations. Comprehensive experiments show that dynamic semantic-scale-balanced learning consistently improves model performance on large-scale long-tailed and non-long-tailed natural and medical datasets, which is a good starting point for mitigating this prevalent but unnoticed model bias.
CutMix is a vital augmentation strategy that determines the performance and generalization ability of vision transformers (ViTs). However, the inconsistency between the mixed images and the corresponding labels harms its efficacy. Existing CutMix variants tackle this problem by generating more consistent mixed images or more precise mixed labels, but inevitably introduce heavy training overhead or require extra information, undermining ease of use. To this end, we propose an efficient and effective Self-Motivated image Mixing method (SMMix), which motivates both image and label enhancement by the model under training itself. Specifically, we propose a max-min attention region mixing approach that enriches the attention-focused objects in the mixed images. Then, we introduce a fine-grained label assignment technique that co-trains the output tokens of mixed images with fine-grained supervision. Moreover, we devise a novel feature consistency constraint to align features from mixed and unmixed images. Due to the subtle designs of the self-motivated paradigm, our SMMix is significant in its smaller training overhead and better performance than other CutMix variants. In particular, SMMix improves the accuracy of DeiT-T/S, CaiT-XXS-24/36, and PVT-T/S/M/L by more than +1% on ImageNet-1k. The generalization capability of our method is also demonstrated on downstream tasks and out-of-distribution datasets. Code of this project is available at https://github.com/ChenMnZ/SMMix.
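For context, the baseline CutMix operation that the abstract above builds on can be sketched as follows. This is a minimal pure-Python illustration of standard CutMix on single-channel images stored as nested lists, not the SMMix method itself; the function names `sample_box` and `cutmix` and the `alpha=1.0` default are assumptions made for this example.

```python
import math
import random


def sample_box(height, width, lam, rng):
    """Sample a box whose area is roughly (1 - lam) of the image."""
    cut_h = int(height * math.sqrt(1.0 - lam))
    cut_w = int(width * math.sqrt(1.0 - lam))
    cy = rng.randrange(height)  # box centre, uniform over the image
    cx = rng.randrange(width)
    top = max(cy - cut_h // 2, 0)
    bottom = min(cy + cut_h // 2, height)
    left = max(cx - cut_w // 2, 0)
    right = min(cx + cut_w // 2, width)
    return top, bottom, left, right


def cutmix(img_a, img_b, onehot_a, onehot_b, alpha=1.0, rng=None):
    """Paste a random box from img_b into img_a and mix the labels."""
    rng = rng or random.Random()
    height, width = len(img_a), len(img_a[0])
    lam = rng.betavariate(alpha, alpha)
    top, bottom, left, right = sample_box(height, width, lam, rng)

    mixed = [row[:] for row in img_a]  # copy so img_a is not modified
    for y in range(top, bottom):
        mixed[y][left:right] = img_b[y][left:right]

    # Recompute lam from the clipped box so labels match the pixel ratio.
    lam = 1.0 - (bottom - top) * (right - left) / (height * width)
    label = [lam * a + (1.0 - lam) * b for a, b in zip(onehot_a, onehot_b)]
    return mixed, label
```

Because the label weight depends only on box area, a box that happens to cover background rather than the object still shifts the label, which is the image-label inconsistency the abstract refers to.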
Storytelling and narrative are fundamental to human experience, intertwined with our social and cultural engagement. As such, researchers have long attempted to create systems that can generate stories automatically. In recent years, powered by deep learning and massive data resources, automatic story generation has shown significant advances. However, considerable challenges, like the need for global coherence in generated stories, still keep generative models from reaching the storytelling ability of human narrators. To tackle these challenges, many studies seek to inject structured knowledge into the generation process, which is referred to as structured knowledge-enhanced story generation. Incorporating external knowledge can enhance the logical coherence among story events, achieve better knowledge grounding, and alleviate over-generalization and repetition problems in stories. This survey provides an up-to-date and comprehensive review of this research field: (i) we present a systematic taxonomy of how existing methods integrate structured knowledge into story generation; (ii) we summarize the story corpora, structured knowledge datasets, and evaluation metrics involved; (iii) we give multidimensional insights into the challenges of knowledge-enhanced story generation and shed light on promising directions for future study.
Most shadow removal methods rely on training images paired with laborious and costly shadow region annotations, which has led to the increasing popularity of shadow image synthesis. However, these synthesized images are themselves a source of poor performance, since their shadows are often inauthentic and their details impaired. In this paper, we present a novel generation framework, referred to as HQSS, for high-quality pseudo shadow image synthesis. The given image is first decoupled into a shadow region identity and a non-shadow region identity. HQSS employs a shadow feature encoder and a generator to synthesize pseudo images. Specifically, the encoder extracts the shadow feature of one region identity, which is then paired with another region identity to serve as the generator input for synthesizing a pseudo image. The pseudo image is expected to carry the same shadow feature as its input shadow feature, as well as realistic image detail matching its input region identity. To fulfill this goal, we design three learning objectives. When the shadow feature and input region identity come from the same region identity, we propose a self-reconstruction loss that guides the generator to reconstruct a pseudo image identical to its input. When the shadow feature and input region identity come from different identities, we introduce an inter-reconstruction loss and a cycle-reconstruction loss to ensure that shadow characteristics and detail information are well retained in the synthesized images. HQSS is observed to outperform state-of-the-art methods on the ISTD dataset, the Video Shadow Removal dataset, and the SRD dataset. The code is available at https://github.com/zysxmu/HQSS.
Scene text editing (STE) aims to replace text with the desired text while preserving the background and style of the original text. However, due to complicated background textures and varied text styles, existing methods fall short in generating clear and legible edited text images. In this study, we attribute the poor editing performance to two problems: 1) Implicit decoupling structure. Previous methods that edit the whole image have to learn different translation rules for background and text regions simultaneously. 2) Domain gap. Due to the lack of edited real scene text images, the network can only be trained well on synthetic pairs and performs poorly on real-world images. To handle these problems, we propose a novel network by MOdifying Scene Text image at strokE Level (MOSTEL). First, we generate stroke guidance maps to explicitly indicate the regions to be edited. Unlike the implicit approach of directly modifying all pixels at the image level, such explicit instructions filter out distractions from the background and guide the network to focus on the editing rules of text regions. Second, we propose Semi-supervised Hybrid Learning to train the network with both labeled synthetic images and unpaired real scene text images. Thus, the STE model is adapted to real-world dataset distributions. Moreover, two new datasets (Tamper-Syn2k and Tamper-Scene) are proposed to fill the gap in public evaluation datasets. Extensive experiments demonstrate that MOSTEL outperforms previous methods both qualitatively and quantitatively. Datasets and code will be available at https://github.com/qqqyd/MOSTEL.
Training labels for graph embedding algorithms can be costly to obtain in many practical scenarios. Active learning (AL) algorithms help obtain the most useful labels for training while keeping the total number of label queries under a given budget. The existing Active Graph Embedding framework uses a centrality score, a density score, and an entropy score to evaluate the value of unlabeled nodes, and it has been shown to bring some improvement to the node classification tasks of Graph Convolutional Networks. However, when evaluating the importance of unlabeled nodes, it fails to consider the influence of existing labeled nodes on the value of unlabeled nodes. In other words, for a given unlabeled node, the computed informative score is always the same and is agnostic to the labeled node set. To address this limitation, we introduce three dissimilarity-based information scores for active learning: feature dissimilarity score (FDS), structure dissimilarity score (SDS), and embedding dissimilarity score (EDS). We find that these three scores take into account the influence of the labeled set on the value of unlabeled candidates, boosting AL performance. In our experiments, the proposed scores improve classification accuracy by 2.1% on average and generalize to different Graph Neural Network architectures.